MULTIPARENTAL POPULATIONS Multiple Quantitative Trait Analysis Using Bayesian Networks

Authors

  • Marco Scutari
  • Phil Howell
  • David J. Balding
  • Ian Mackay
Abstract

Models for genome-wide prediction and association studies usually target a single phenotypic trait. However, in animal and plant genetics it is common to record information on multiple phenotypes for each individual that will be genotyped. Modeling traits individually disregards the fact that they are most likely associated due to pleiotropy and shared biological basis, thus providing only a partial, confounded view of genetic effects and phenotypic interactions. In this article we use data from a Multiparent Advanced Generation Inter-Cross (MAGIC) winter wheat population to explore Bayesian networks as a convenient and interpretable framework for the simultaneous modeling of multiple quantitative traits. We show that they are equivalent to multivariate genetic best linear unbiased prediction (GBLUP) and that they are competitive with single-trait elastic net and single-trait GBLUP in predictive performance. Finally, we discuss their relationship with other additive-effects models and their advantages in inference and interpretation. MAGIC populations provide an ideal setting for this kind of investigation because the very low population structure and large sample size result in predictive models with good power and limited confounding due to relatedness.

Understanding the behavior of complex traits involves modeling a web of interactions among the effects of genes, environmental conditions, and other covariates. Ignoring one or more of these factors may substantially affect the accuracy and the generality of the conclusions that can be drawn from the model (Li et al. 2006; Hartley et al. 2012; Alimi et al. 2013), both in the context of genome-wide association studies (GWAS) and genomic selection (GS). Indeed a lot of attention has been devoted in recent literature to improving traditional additive genetic models, which were originally defined using only allele counts (e.g., Meuwissen et al. 2001), by supplementing them with additional information.
Some examples include marker-based kinship coefficients (Speed et al. 2012), spatial heterogeneity and dominance (Finley et al. 2009), and gene expression data (Druka et al. 2008). However, most studies in plant and animal genetics still focus on a single phenotypic trait at a time despite the availability of a set of simultaneously measured traits for each genotyped individual. Models for analyzing multiple traits have been available since Henderson and Quaas (1976) introduced the multivariate extension of the genetic best linear unbiased prediction (GBLUP) models, and have been investigated as recently as Stephens (2013) in the context of GWAS. More recent additions include structural equation models (SEM; Li et al. 2006), a Bayesian extension of seemingly unrelated regression (SUR; Banerjee et al. 2008), the MultiPhen ordinal regression (O’Reilly et al. 2012), and spatial models (Banerjee et al. 2012).

In this article we use Bayesian networks (BNs; Pearl 1988; Koller and Friedman 2009) to build a multivariate dependency model that accounts for simultaneous associations and interactions among multiple single nucleotide polymorphisms (SNPs) and phenotypic traits. BNs have been applied to the analysis of several kinds of genomic data such as gene expression (Friedman 2004), protein–protein interactions (Jansen et al. 2003; Sachs et al. 2005), pedigree analysis (Lauritzen and Sheehan 2004), and the integration of heterogeneous genetic data (Chang and McGeachie 2011). Their modular nature makes them ideal for analyzing large marker profiles. As far as SNPs are concerned, BNs have been used to investigate linkage disequilibrium (LD; Mourad et al. 2011; Morota et al. 2012) and epistasis (Han et al. 2012) and to determine disease susceptibility for anemia (Sebastiani et al.
2005), leukemia (Chang and McGeachie 2011), and hypertension (Malovini et al. 2009). The same BN can simultaneously highlight SNPs potentially involved in determining a trait (e.g., for association purposes) and be used for prediction (e.g., for selection purposes): a network capturing the relationship between genotypes and phenotypes can be used to compute the probability that a new individual with a particular genotype will have the phenotype of interest (Lauritzen and Sheehan 2004; Cowell et al. 2007).

Copyright © 2014 by the Genetics Society of America. doi: 10.1534/genetics.114.165704. Manuscript received April 28, 2014; accepted for publication July 7, 2014. Corresponding author: UCL Genetics Institute (UGI), University College London, Gower St., London WC1E 6BT, United Kingdom. E-mail: [email protected]. Genetics, Vol. 198, 129–137, September 2014.

Materials and Methods

A BN is a probabilistic model in which a directed acyclic graph $G$ is used to define the stochastic dependencies quantified by a probability distribution (Pearl 1988; Koller and Friedman 2009). The variables $X = \{X_i\}$ under investigation in this context include $T$ traits $X_{t_1}, \ldots, X_{t_T}$ and $S$ SNPs $X_{s_1}, \ldots, X_{s_S}$, each of which is associated with a node in $G$. The arcs between the nodes represent direct stochastic dependencies and determine how the global distribution of $X$ decomposes into a set of local distributions,

$$P(X) = \prod_{i} P(X_i \mid \Pi_{X_i}), \tag{1}$$

one for each variable $X_i$, depending only on its parents $\Pi_{X_i}$. This modular representation can capture direct and indirect associations between SNPs and phenotypes and associations between SNPs due to linkage and population structure. In the spirit of commonly used additive genetic models for quantitative traits (e.g., Meuwissen et al. 2001), we make some further assumptions on the BN:

1. each variable $X_i$ is normally distributed, and $X$ is multivariate normal;
2. stochastic dependencies are assumed to be linear;
3. traits can depend on SNPs (i.e., $X_{s_i} \to X_{t_j}$) but not vice versa (i.e., not $X_{t_j} \to X_{s_i}$), and they can depend on other traits (i.e., $X_{t_i} \to X_{t_j}$, $i \neq j$); and
4. SNPs can depend on other SNPs (i.e., $X_{s_i} \to X_{s_j}$, $i \neq j$).

We also assume that dependencies between traits broadly follow the temporal order in which they are measured; for instance, traits that are measured when a plant variety is harvested can depend on those that are measured while it is still in the field (and obviously on the markers as well), but not vice versa. In other words, assumptions 3 and 4 define BNs that describe the dependencies of phenotypes on genotypes in a prognostic model, as opposed to a diagnostic model in which genotypes depend on phenotypes. The latter is often preferred over the former because it results in simpler models when the $X_i$ are discrete (Sebastiani and Perls 2008); in that setting, the number of parameters grows exponentially with the number of parents of each node. However, this is not the case here due to assumptions 1 and 2. Under these assumptions, the local distribution $P(X_{t_i} \mid \Pi_{X_{t_i}})$ of each trait is a linear model of the form

$$X_{t_i} = \mu_{t_i} + \Pi_{X_{t_i}} \beta_{t_i} + \varepsilon_{t_i} = \mu_{t_i} + \underbrace{X_{t_j} \beta_{t_j} + \ldots + X_{t_k} \beta_{t_k}} \ldots$$
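
The decomposition in Equation 1 and the linear local distributions can be illustrated with a small sketch. This is not the authors' implementation: the three-node network (one SNP `s1`, two traits `t1` and `t2`), the simulated coefficients, and the helper names `solve`, `fit_local`, `local_loglik`, and `predict_t2` are all hypothetical and chosen only for illustration. Each node is fitted by ordinary least squares on its parents, the joint log-likelihood is the sum of the local ones, and a trait is predicted for a new genotype by propagating expected values along the arcs.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_local(data, node, parents):
    """Fit node = mu + sum(parent * beta) + eps; return ([mu, betas...], sigma2)."""
    rows = [[1.0] + [d[p] for p in parents] for d in data]
    y = [d[node] for d in data]
    k = len(parents) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(coefs, r)) for r, yi in zip(rows, y)]
    return coefs, sum(e * e for e in resid) / len(y)

def local_loglik(data, node, parents, coefs, sigma2):
    """Gaussian log-likelihood of one local distribution P(X_i | parents)."""
    total = 0.0
    for d in data:
        mean = coefs[0] + sum(b * d[p] for b, p in zip(coefs[1:], parents))
        total += -0.5 * math.log(2 * math.pi * sigma2) - (d[node] - mean) ** 2 / (2 * sigma2)
    return total

# Simulated data (assumption): the SNP is treated as a continuous normal variable,
# in line with assumption 1; all coefficients are made up for the example.
random.seed(1)
data = []
for _ in range(2000):
    s1 = random.gauss(0.0, 1.0)
    t1 = 1.0 + 0.8 * s1 + random.gauss(0.0, 0.5)
    t2 = -0.5 + 0.3 * s1 + 0.6 * t1 + random.gauss(0.0, 0.5)
    data.append({"s1": s1, "t1": t1, "t2": t2})

# Parents of each node: traits depend on SNPs and on earlier traits, never the reverse.
graph = {"s1": [], "t1": ["s1"], "t2": ["s1", "t1"]}
params = {node: fit_local(data, node, parents) for node, parents in graph.items()}

# Equation 1 on the log scale: the joint log-likelihood is a sum of local terms.
joint_loglik = sum(local_loglik(data, node, graph[node], *params[node]) for node in graph)

def predict_t2(s1):
    """Expected t2 for a new genotype, propagating E[t1 | s1] through the network."""
    (mu1, b1), _ = params["t1"]
    (mu2, b_s, b_t), _ = params["t2"]
    return mu2 + b_s * s1 + b_t * (mu1 + b1 * s1)

print("t1 | s1 coefficients:", params["t1"][0])
print("t2 | s1, t1 coefficients:", params["t2"][0])
print("joint log-likelihood:", joint_loglik)
print("predicted t2 for s1 = 1.0:", predict_t2(1.0))
```

Because every local distribution is a linear regression with a handful of coefficients, adding a parent adds one parameter rather than multiplying the size of a conditional probability table, which is the point made above about assumptions 1 and 2.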




Publication date: 2014